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Abstract 

Space-filling curves can be used to organise points in the plane into bounding-box hier- 
archies (such as R-trees). We develop measures of the bounding-box quality of space-filling 
curves that express how effective different space-filling curves are for this purpose. We give 
general lower bounds on the bounding-box quality measures and on locality according to 
Gotsman and Lindenbaum for a large class of space-filling curves. We describe a generic 
^ algorithm to approximate these and similar quality measures for any given curve. Using 

our algorithm we find good approximations of the locality and the bounding-box quality of 

04 several known and new space-filling curves. Surprisingly, some curves with relatively bad 

locality by Gotsman and Lindenbaum's measure, have good bounding-box quality, while the 

__ curve with the best-known locality has relatively bad bounding-box quality. 

o 

^ 1 Introduction 

a space-filling curve is a continuous, surjective mapping from M to M'*. It was not always clear 

that such a mapping would exist for d > 1, but in the late 19th century Peano showed that it 

^ is possible for d = 2 and d = 3 [26i. Since then, quite a number of space-filling curves have 

C ■ appeared in the literature. Sagan wrote an extensive treatise on space-filling curves [27| , which 

discusses most curves included in our study. During the early days space-filling curves were 

T^lj- primarily seen as a mathematical curiosity. Today however, space- filling curves are applied in 

\^ areas as diverse as load balancing for grid computing, colour space dimension reduction, small 

antenna design, I/O-efficient computations on massive matrices, and the creation of spatial 

data indexes. In this paper, we focus on the application of space-filling curves to the creation 

of query-efficient spatial data indexes such as R-trees. 
> 

^ Bounding-box hierarchies We consider the following type of spatial data indexes for points 

5^ in the plane. The data points are organised in blocks of at most B points, for some parameter B, 

such that each point is stored in one block. With each block we associate a bounding box, which 
is the smallest axis-aligned rectangle that contains all points stored in the block. The block 
bounding boxes are then organised in an index structure. Intersection queries are answered 
as follows: to find all points intersecting a query window Q, we query the index structure for 
all bounding boxes that intersect Q; then we retrieve the corresponding blocks, and check the 
points stored in those blocks one by one. To find the nearest neighbour to a query point q, one 
can use the index to search blocks in order of increasing distance from q. Thus one retrieves 
exactly the blocks whose bounding boxes intersect the largest empty circle around q. 
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Figure 1: (a) Leaves of an R-tree with B = 3. (b) Measuring locality for a particular curve 
section. L2-locality ratio between p and q = squared Euclidean distance between p and q, divided 
by the area covered by the curve section between p and q: (6^ + 5^) /87 ~ 0.70. Bounding-box 
area ratio between p and q = area of the bounding-box of the curve section S between p and 
q, divided by the area covered by S: 12 • 12/87 ~ 1.66. (WBA is the maximum over all pairs p 
and q) 

An R-tree [19] is an example of the type of structure described above: the blocks constitute 
the leaves of the tree, and the higher levels of the tree act as an index structure for the block 
bounding boxes. In practice the query response time is mainly determined by the number of 
blocks that need to be retrieved: this is because the bounding box index structure can often be 
cached in main memory, while the blocks (leaves) with data points have to be stored on slow 
external memory (for example a hard disk needing 10 ms for each seek). 

An R-tree is not uniquely defined by a set of data points. Any distribution of the input 
points over the leaves (blocks) may be used as the basis of an R-tree, as long as each point is 
put in exactly one block, and each block contains at most B points. One way of making the 
distribution is by ordering the input points along a space-filling curve [15] and then putting 
each next group of B points together in a block (see Figure [T|a)). 

Since the number of blocks retrieved to answer a query is simply the number of bounding 
boxes intersected, it is important that the ordering induced by the space-filling curve makes us 
fill each block with points that lie close to each other and thus have a small bounding box. In fact 
we can make some more precise observations for intersection queries with query windows that 
are points or lines. For point-intersection queries we observe that if the data and query points 
lie within a square of area 1, the average number of blocks retrieved for uniformly distributed 
point queries is simply the total area of the bounding boxes. For line- intersection queries with 
uniformly distributed orientation (between and vr) and signed distance from the centre of 
the square (in an interval containing at least [— |-v/2, 5\/2]), the chance of any particular block 
being retrieved is proportional to w{(j))d(l), where w{(j)) is the width of the bounding box in 
the orthogonal projection on a line with orientation (j). By the Crofton formula Jq w{(p)d(f) is 
simply the perimeter of the bounding box of the block, so for uniformly distributed line queries, 
the average number of blocks retrieved is proportional to the total perimeter of the bounding 
boxes. Therefore our goal is to have bounding boxes with small (total) area and small (total) 
perimeter. 

Our results We investigate which space-filling curves best achieve the above-mentioned goal: 
sorting points into bounding boxes with small (total) area and small (total) perimeter. To this 
end we propose new quality measures of space-filling curves that express how effective different 
space- filling curves are in this context. We also provide an algorithm to compute approximations 
of these and similar quality measures for any given curve. We used this algorithm to compute 
approximations of known measures of so-called curve-to-plane locality and of our new bounding- 
box quality measures for several well-known and new space- filling curves. 
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The known locality measures considered are the maximum, over all contiguous sections S 
of a space-filling curve, of the squared Loo-, L2- or Li-distance between the endpoints of S 
divided by the area covered by S (studied by Gotsman and Lindenbaum [10] and many other 
authors P El El HZl EH |25]), see Figure [T|b) for an example. 

Our first new measure is the maximum, over all contiguous sections of a space-filling curve, 
of the area of the bounding box of S divided by the area covered by S. We call this measure 
the worst-case bounding-box area ratio (WBA, Figure [Tj^b) ) . Our second new measure considers 
l/16th of the squared perimeter instead of the area, and we call it worst-case bounding-box 
squared perimeter ratio (WBP). 

We prove that WBA and WBP are at least 2 for a large class of space- filling curves. We also 
show that this class of curves has L2-locality at least 4, thus complementing earlier results by 
Niedermeier et al. |2jj who proved this for another class of space-filling curves (more restricted 
in one way, more general in another way). 

We found that Peano's original curve achieves a WBA-value of less than 2.0001; the exact 
value is probably exactly 2, which is optimal for this class of curves. Other well-known curves 
have WBA-values ranging from 2.400 to 3.000. However, on the WBP measure Peano's curve 
is not that good, with WBP = 2.722. Considering both WBA and WBP, the best curve we 
found in the hterature is the /3J^-curve [28 , with WBA = 2.222 and WBP = 2.250. However, in 
this paper we present a new variation on Peano's curve with even better scores: a WBA-value 
of 2.000 and a WBP-value of 2.155. This variation also performs very well on L^o-, L2- and 
Li-locality. 

Both WBA and WBP consider the worst case over all possible subsections of the curve. 
However, in the context of our application, it may be more relevant to study the total bounding 
box area and perimeter of a set of disjoint subsections of the curve that together cover the 
complete curve. We can argue that in the limit, we may have the worst case for all subsections 
in such a cover, but this seems to be unlikely to happen in practice. Therefore we study the 
total bounding box area and perimeter of random subdivisions of the curve into subsections. 
Here we find that many curves perform roughly equally well, but those with particularly bad 
WBA- or WBP-values, such as the Sierpihski-Knopp curve [27] or H-order [21], the AR'^W'^- 
curve |j4 , or Peano's original (unbalanced) curve (regarding bounding box perimeters), are 
clearly suboptimal in this sense. 

We also estimate the total diameter of the subsections in random subdivisions of the curve 
and present results for octagonal bounding boxes rather than rectangular bounding boxes. 

Below we first explain how different space-filling curves can be described and how they can 
be used to order points. We also describe some new curves. Next we define the locality and 
bounding-box quality measures, and prove lower bounds. After that we present our approxi- 
mation algorithm, and present the results of our computations. 

2 Describing and using space-filling curves 

There are many ways to define space-filling curves, for example algebraic, like in Peano's pa- 
per [26], or by describing an approximation of the curve by a polyline, with a rule on how to 
refine each segment of the approximation recursively [9]; many authors do this by specifying 
the regions filled by sections of the curve together with the location of the endpoints of such 
sections on the region boundaries (for example [H |271 [28] ) . Since we are concerned with the use 
of space-filling curves as a way to order points in the plane, we choose a method to describe 
space-filling curves that is based on defining how to order the space inside a (usually square) 
unit region. We will see later how such a description is also a description of a curve. 
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2.1 How to define and use a scanning order 

We define an order (scanning order) -< of points in tlie plane as follows. We give a set of rules, 
each of which specifies (i) how to subdivide a region in the plane into subregions; (ii) what is the 
order of those subregions; and (iii) for each subregion, which rule is to be applied to establish 
the order within that subregion. We also specify a unit region of area 1 for each order (usually 
the unit square), and we indicate what rule is used to subdivide and order it. Technically it 
would be possible to extend the orders to the full plane, but for simplicity we rather assume 
that all data that should be ordered is first scaled to lie within the unit region. 

The definitions of the scanning orders discussed in this paper are shown in Figure |2j Each 
rule is identified by a letter, and pictured by showing a region, its subdivision into subregions, 
the scanning order of the subregions (by numbers {0,1,2,...}), and the rules applied to the 
subregions (by letters). Variations of rules that consist of simply rotating or mirroring the 
order of and within subregions, are indicated by rotating or mirroring the letter identifying that 
rule. Variations that consist of reversing the order of and within the subregion are indicated by 
an overscore (Figure [2][^k,l,m)) — making such reversals explicit is the main difference between 
our notation and the notation of, for example, Asano et al. |4j or Wierum [28 . 

We can now see how we can implement a comparison operator that allows us to sort points 
according to a given scanning order. To decide whether p precedes q in the order, we determine 
in which subregions of the unit region p and q lie. If they are in different regions, p precedes 
q if and only if p lies in the lowest-numbered region of the two. If p and q lie in the same 
region, we compare them according to the rule for that subregion recursively. Ambiguity on 
the region boundaries can be resolved as follows: horizontal and diagonal region boundaries are 
always assumed to be included in the region above them; vertical boundaries are assumed to 
be included in the region to the right. 

Each drawing in Figure |2] includes a curve that roughly indicates the scanning order within 
the subdivisions. To obtain an arbitrarily fine approximation of a space- filling curve corre- 
sponding to a given scanning order, we may compute the subdivision of the unit region into 
subregions recursively to the desired depth of recursion, and connect the centre points of the 
resulting subregions by a polygonal curve in the order specified by the rules. Figure [2] includes 
a small example for each scanning order. 

In the rest of this paper, whenever we write "space- filling curve", what we really mean is 
the scanning order that defines it. 



2.2 The curves 

The traditional scanning orders considered in this paper are the following. 

• GP-order, producing the space-filling curve described in detail by Giuseppe Peano [261 [22] 
(Figure [2|^a)). We call this order GP-order instead of Peano order to avoid confusion with 
other curves that have also been referred to as the Peano curve by other authors. 

Sierpinski-Knopp- order, producing the Sierpihski-Knopp curve p7] (Figure gh)). It or- 



ders triangular regions, and can be used to order points as described in Section [2.1 
Niedermeier et al. [M] describe how to use it to order squares in a 2^^ x 2^ grid for any 
€ N, their variation is called H-order. For all purposes in our paper, Sierpihski-Knopp 
order and H-order are equivalent. 

Hubert order, producing Hilbert's curve [12] (Figure [2|^i) ) . One could say that Peano had 
already suggested that such a curve would exist ^26^ . 
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Figure 2: Space- filling curve definitions and example approximating polylines. 

Z-order, which follows a space-filling curve by Lebesgue |16j (Figure [2]^j)); Morton is 
credited with introducing Z-order to computer science 



Gosper flowsnake order, producing the Gosper curve, also known as the flowsnake f2l 
(Figure [2|^m)). It involves scaling with a factor l/\/7 and rotations with angles of 
{0, Itt, Itt} — arctan g V3. 



In addition to the above we include a number of variations on these orders. Wunderlich |30] 
defined a class of orders that satisfy certain restrictions, including: 

• simplicity: the order is defined by only one rule, and transformed versions of it; 

• order-preservation: transformations are rotations and/or reflections (no reversals); 

• edge- connectivity: considering the set of regions obtained by applying the rule to any 
depth of recursion, we find that any two consecutive regions in the order share an edge; 
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• uniformity: all subregions have the same size. 

Wunderlich categorises all different simple, order-preserving, edge-connected, uniform orders 
based on subdividing the unit square in a grid of 2 x 2 or 3 x 3 squares (modulo rotations, 
reflections and reversals). There is only one such order on a 2 x 2 grid (Hilbert order), and 
there are only 273 on a 3 x 3 grid: one which we call R-order^ (Figure [2|f)), and 272 different 
orders that put the nine grid cells in the order of the 'serpentine' pattern of the GP-order (these 
differ in the rotations and reflections of the cells). We investigated all of these orders to some 
extent, and based on the results we studied some of them in more depth, particularly those 
depicted in Figures [2]^a)-(e). By Wunderlich's numbering scheme these orders are Serpentine 
orders 000 000 000, 011010110, 101010101, 110110110, and 111 111 111, respectively. The first 
of these is simply GP-order. Luxburg [T7] examined the third order calling it Variation 2; we 
call the next order Meurthe order after the river in whose watershed we first presented it, and 
we call the last order coil order (a variation also commonly found on the Internet, and studied 
by Luxburg as Variation 1). We explain in Section [6] why we chose exactly these orders. 

When some of the above-mentioned restrictions are dropped, more orders are possible. An 
example from the literature is Wierum's pH.-order'^ [28], which is not simple, and unlike the 
other orders, it does not start and end in the corners of the unit square (Figure [2]^k) ) . Another 
example is the AR^W^-ordei which is not simple and not edge-connected, but still vertex- 
connected (Figure |2|^1)). It was designed to have the special property that any axis-parallel 
square query window can be covered by three contiguous sections of the curve that together 
cover an area of at most a constant times that of the query window [4, — from most other curves 
in our study one would need at least four sections to get such a constant-ratio cover. 

Most orders discussed so far cannot be scaled in only one dimension, because their definitions 
involve rotations. For GP-order this is not the case. In fact, as we will see later, a scaling in the 
horizontal dimension by a factor \/3 results in an order with much better locality properties. 
We call the scaled order balanced GP-order (Figure |2|g) ) . 

3 Quality measures for space-filling curves 

Several locality measures, or more generally, quality measures of space- filling curves have been 
considered in the literature. These include: 

• bounds on the (average) distance between two points along the curve as a function of their 
distance in the plane [51 [201 [29] (non-trivial worst-case bounds are not possible pO] ): 

• bounds on the (worst-case or average) distance between two points in the plane as a 
function of their distance along the curve [7] [TOl [21] ; 

• bounds on the (worst-case or average) number of contiguous sections of the curve that is 
needed to cover an axis-parallel query window, without covering too much space outside 
the query window [il W\ [Til |2T]: 

• bounds on the (worst-case or average) perimeter or diameter of sections of the curve as a 
function of their area |13| |28] . 

^Wunderlich calls it Meander, but because other authors have used that name to describe other orders, we 
rather use R-order. 

^For simplicity we take an Sl-shaped section of the curve. Wierum adds a special rule for the unit square 
so that he gets a closed (cyclic) curve, but for the purposes of our discussion this would be an unnecessary 
complication. 
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Not all methods of analysis considered in the literature can easily be applied to compare all 
curves in our study. Some calculations or experiments in the literature are based on how the 
scanning order sorts the points of a regular square grid whose size is a fixed integral power of 
the number of subregions in the rule(s) that define(s) the order. Such measures may vary with 
the grid size, which prevents a fair comparison between, for example, GP-order (for which we 
would have to use a grid size that is a power of nine) and Hilbert order (for which we would 
have to use a grid size that is a power of four) . To enable a comparison between a broad range 
of curves, we need measures that can be computed efficiently for large grids and converge when 
the grid size goes to infinity. 



3.1 Notation 

Before we can discuss and analyse quality measures for space-filling curves in detail, we need 
to introduce some notation. For ease of writing, we assume for now that if a scanning order is 
defined by more than one rule, then each rule contains the same number of subregions. 

A rule of a scanning order defines how to subdivide a unit region C of size (area) 1 into n 
subregions, numbered 0, ...,n — 1. The scanning order inside subregion i is given by applying a 
transformation r(z) to the unit region C and the way it is ordered by the ordering rule. For any 
base-n number a we use a' to denote its first digit, and a" to denote the remaining digits. We use 
C{a) as a shorthand for T{a'){C{a")), where C(0) = C. For example, C(538) is subregion 8 of 
subregion 3 of subregion 5, and it is found by applying transformation r(5) to C(38). Similarly, 
we use r(a) as a shorthand for T(a') o r(a"), where r(0) is the identity transformation. 

By 1^1 we denote the size of a region A. We have < |C(z)| < 1 for all < i < n (there are 
no empty subregions in the rules), and X]o<i<n 1^(01 = 1^*1 = 1- 

Let Nk denote the set of A;-digit base-n numbers. We write a X 6 if, in base-n notation, a 
and b have the same number of digits and a < b. By C(^ b) we denote the union of subregion b 
and its predecessors, that is, U^l'o^ C(i) U r(6')(C(^ b")), where C{< 0) = C. Define C(x b) := 
C7(^ b) \ C{b), C{h a):=C\ C{-< a), C7(^ a):=C\ C{< a), and C(a, b) := C{-< b) \ C[-< a). 

Above we talked about the distance between two points along the curve, which may be a 
somewhat counter-intuitive concept for a curve that can be refined and therefore lengthened 
indefinitely. However, the distance between two points p and q along the curve is well-defined 
as the area filled by the section of the curve that runs from p to g, or more precisely, as: 

\C{p,q)\:= lim min |C(a,6)|. 

fe-^oo a,b(^Nk S.t. p&C{a),q&C{b) 



3.2 Pairwise locality measures 

From the quality measures mentioned above, the most relevant and applicable to the con- 
struction of bounding-box hierarchies seem to be those that bound the (worst-case or average) 
distance between two points in the plane as a function of their distance along the curve. This 
is, intuitively, because points that lie close to each other along the curve are likely to be put 
together in a block. Then, if the distance between those points in the plane is small too, the 
block may have a small bounding box. 

Gotsman and Lindenbaum [lOj defined the following class of locality measures: 

lim ma. 

m— >oo l<i<j<m [j — i)/m 

where i and j are integers, S{i) is the ith square along the curve in a subdivision of the unit 
square into a regular grid of m squares, and dr{S, T) is the L^-distance between the centre point 
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{Sx, Sy) of S and the centre point {Tx,Ty) of T. Thus dr{S, T) = {\Sx — TxY' + \Sy~ TyYY^'^ for 
r G N, and doo{S, T) = max(|S'a; — Ta;|, \Sy — Ty\). We note that the measure is easily generahsed 
to scanning orders that are not based on regular grids, by defining it as: 

WL^ := hm sup \ . 

We call this measure WL^ for Worst-case Locality^ as it indicates for points that lie close to 
each other on the curve how far from each other they might get in the plane. Since we have 
di{p, q) > d2{p, q) > doo{p, q) for any pair of points p and q, we have WLi > WL2 > WLqo for 
any space-filling curve. 



3.3 Pairwise bounding box measures 

Intuitively, one may expect a relation between locality and bounding box size, as explained 
above. However, we may also try to measure bounding box size directly. We define two measures 
to do so. The first is the worst-case bounding box area ratio (WBA): 

..r^. 1. |bbox(C(a,6))| 
WBA:= hm sup ' \:. "\ 

where bbox(S') is the smallest axis-aligned rectangle that contains S. The second measure is 
the worst-case bounding box square perimeter ratio (WBP): 

WRP 1 1- peri(bbox(C(a,6)))^ 
16 a,beNk \C{a,b)\ 

where peri(S') is the perimeter of S in the L2 metric. Taking the square of the perimeter 
is necessary, because otherwise the measure would be unbounded as k (the resolution of the 
"grid") goes to infinity. Because the rectangle of smallest perimeter that has any given area is 
a square, the "ideal" bounding box has squared perimeter 16. The division by 16 gives an easy 
relation between WBP and WBA: we have WBP > j^(4\/WBA)^ = WBA. Furthermore, since 
the perimeter of the bounding box of two points p and q is simply twice their Li-distance, we 
have WBP > ^{iVWLi)'^ = \WLi. 

We can define measures similar to WBA and WBP when the bounding boxes used are not 
axis-parallel rectangles, but convex octagons whose sides have normals at angles of 0, ^n, ^tt, 
Itt, tt, jTT, Itt, and ^vr with the positive a;-axis. We call these measures worst-case bounding 
octagon area ratio (WOA) and worst-case bounding octagon square perimeter ratio (WOP). 
In the definition of WOP we still use the factor 16 to allow a direct comparison with WBP 
and WBA. Because the octagon of smallest perimeter that has area 1 has squared perimeter 
32/(1 + V2) f=i 0.828 • 16, we have WOP > 0.828 • WOA. 



3.4 Total bounding box measures 

Worst-case For our application we argued that the average query response time is related to 
the total area and perimeter of the bounding boxes formed by grouping data points according to 
a given scanning order. When the points are sufficiently densely distributed in the unit region, 
the gap in the scanning order between the last point of a group and the first point in the next 
group will typically be small. Thus the grouping practically corresponds to subdividing the 
complete unit region into curve sections, of which we store the bounding boxes. To assess the 
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Figure 3: Left: an approximation of a section S of the GP curve with |bbox(S')|/|S'| = WBA. 
Middle and right: we can cover an arbitrarily large part of the unit region with such worst-case 
sections. 



quality of the order, we can define the worst-case total bounding box area (TBA) as follows: 
TEA := lim sup f V |bbox(C(ai_i, ai))| ) , 

^^oo ai-;a2^...^a™_ieAffe J 

where qq is defined as and is defined as 0. Since the bounding box area of each section 
C(aj_i, aj) is at most WBA • |C(aj_i, ai)\ and the area of all sections together sum up to 1, the 
total bounding box area is clearly at most WBA. But in fact we can prove the following. 

Lemma 1 For any uniform scanning order, we have TBA = WBA. 

Proof: Consider a section S of the curve such that |bbox(S')| is equal to, or close to, WBA -151. 
Now consider a recursive subdivision, following the rules of the scanning order, of the unit region 
into a set D of m subregions (cells). Let S' be an approximation of S that consists of all the 
cells of D that are completely covered by S, and let D' be the remaining cells of D. Note that 
we can repeat the construction within each of the cells of D' (Figure [s]) . 

TBA > |bbox(5')l + \D'\/m ■ TBA > |bbox(5')| + (1 - \S\) ■ TBA. 

By choosing m big enough we can let |bbox(S")| be arbitrarily close to WBA - {SI. Thus we get: 

TBA > WBA • \S\ + (1 - |5|) • TBA, 

which solves to TBA > WBA. □ 



Average As we showed above, the worst-case total bounding box area is not very informative. 
Of greater practical relevance may be the average total bounding box area (ABA): 

(m 
V |bbox(C(ai_i,ai))| 
^ ' 
i=i 

where the average is taken over sets of m — 1 cutting points ai, Cm-i uniformly chosen from 
the unit region. The above equation may serve as a complete definition of the ABA measure for 
fixed m, but this is not completely satisfactory. Experimental results with the scanning orders 
described in this paper indicate that asymptotically, the measure does not grow or shrink with 
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m, but it exhibits a small fluctuation which repeats itself as m increases by a factor 3, 4 or 9 
(depending on the curve) — in other words, the measure tends to be periodic in log m. Therefore 
any fixed choice of m is likely to give an advantage to some scanning orders and a disadvantage 
to others. Therefore, we define the average total bounding box area more precisely as the 
above measure, averaged over a range of values of m, such that m is large enough and log m 
is uniformly distributed in a range that covers an integral number of periods of fiuctuation 
of the curve under consideration. We define an average total bounding octagon area (AOA) 
analogously. 

We could define an average total bounding box square perimeter in a similar way. However, 
we are ultimately interested in the average perimeter, not the square perimeter. We have to 
be more careful with the effect of m now: we cannot expect to keep roughly constant total 
bounding box perimeter as m increases. To cut up a unit region into m sections such that their 
total bounding box perimeter is minimum, we would have to cut it up into squares of area 1 /m 
each, and their total bounding box perimeter would be 4^/rn. Therefore the total bounding box 
perimeter should be considered relative to 4-y/m, and we define the square average relative total 
bounding box perimeter (ABP) as: 

ABP := lim ( avg^^^^^^ ( V peri(bbox(C(ai_i, aO)) 

In the above definition we still take the square in the end, to allow a direct comparison between 
ABP and WBP. The reader may verify that we must now have 1 < ABA < WBA and 
1 < ABP < WBP. 




3.5 Total diameter measures 

Because for some applications it may be interesting to keep the diameter of curve sections 
small [28] and because our software was easy to adapt to it, our results in Section |6] also include 
estimations of the square average relative total curve section diameter (ADqo), defined as: 

^avg„^^„2^ ^„^_jgjv,;^ ^^diamoo(C(ai_i,ai))j j , 

where diamoo('S') is the diameter of S in the Loo-metric. We also compute ADi: the same 
measure based on the Li-metric. 

4 Lower bounds 

4.1 Worst-case bounding box area 

Theorem 1 Any scanning order with a rule that contains a triangle has WBA > 2. 

Proof: The area of a bounding rectangle of any triangle A is at least twice the area of A. □ 

Theorem 2 Any scanning order based on recursively subdividing an axis-aligned rectangle into 
a regular grid of rectangles has WBA > 2. 

Proof: Consider a subdivision of the unit rectangle into a regular grid of m rectangles, following 
the rules of the scanning order recursively to the depth where a grid of ^yrn x ^yrn rectangles 
is obtained. We distinguish two cases: either there is a pair of rectangles that are consecutive 
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Figure 4: Comer rectangles in a grid. The smooth curve illustrates the order of the rectangles 
along the curve. In each corner rectangle, we marked the outer corner and we shaded the front 
part. 

in the scanning order and do not share an edge, or all pairs of consecutive rectangles share an 
edge. 

In the first case, the bounding box of such a pair contains at least four rectangles, and thus 
the curve section that covers that pair results in WBA > 2. 

In the second case we argue as follows. Let si, ...Sm be these rectangles in order along the 
space-filling curve. For each rectangle Sj (1 < i < m), let the edge of entry be the edge shared 
with Si_i, and for each rectangle Si (1 < i < m), let the edge of departure be the edge shared 
with Sj+i. Among rectangles S2, S3, Sm-i, we distinguish two types of rectangles: straight 
rectangles and corner rectangles. A straight rectangle is a rectangle whose edges of entry and 
departure are not adjacent. A corner rectangle is a rectangle Sj whose edges of entry and 
departure share a vertex — we call this vertex the inner corner of Si, and the opposite vertex is 
the outer corner of Si . The front part of Sj is the part of Sj that appears before the outer corner 
in the order. 

Now we number the corner rectangles ti,t2, ■■■,tk in the order in which they appear on 
the curve, let Pi,P2, ■■■,Pk be their outer corners, and fi, f2, fk be the areas of their front 
parts (Figure [4|. Note that any sequence of at least ^/m rectangles must include a corner 
rectangle, so A; > \/m. Consider the curve section from pi to Pi+2, for any i = l,2,...,fc — 2. 
Let the width of this section (by number of rectangles) be w, let the height be h, and let 
n > 3 be the number of rectangles from ti to inclusive. Observe that because there is 
exactly one corner rectangle between ti and ti+2, namely tj+i, we have w > 2, h > 2, and 
w + h = n + 1 (the +1 is because tj+i counts towards both w and h). Now the area of 
the curve section between pi and Pi+2 is n — 1 + /i+2 — fi, and the area of its bounding box 
is w • h > 2{n — 1). Hence we have WBA > 2{n — l)/(n — 1 + /j_|_2 — fi), or equivalently, 
fi+2 -fi> (2/WBA - l)(n - 1) > 2 • (2/WBA - 1). 

For the sake of contradiction, suppose WBA < 2. From the above we get /2i+2 — f2i ^ 
2 • (2/WBA - 1) for ah i G {1,2,3, ...,m'}, where m' is [^^/^\ - 1. Therefore 2/WBA - 1 < 

2^ Yl'iLiihi+2 - f2i) = ^{f2m'+2 - f2) < This must be true for any grid of rectangles 
that is obtained by refining the subdivision recursively, following the rules of the scanning order. 
So we must have limm-»oo (2/WBA — 1) = and thus limm-»oo WBA = 2. □ 
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Figure 5: Definition of tlie points iq, ii, 12, is and 24 on the boundary of a triangle, and a sketch 
of an order in which the space-fihing curve may visit them. 



4.2 Worst-case locality 

Niedermeier et al. [21] prove WL2 > WLqo > 31/2 for scanning orders that contain a section 
whose perimeter is an axis-ahgned square. The proof is based on defining six points on the 
boundary of the square that need to be visited by the curve. For each possible order in which 
these points may be visited, they add up the squares of the distances between each pair of 
consecutive points in the order. Thus they derive a lower bound on this sum that holds for all 
orders in which the points can be visited. For the Lqo- and L2-metuc this lower bound is at 
least 3 1/2 times the area of the square, and the lower bounds on WLqo and WL2 follow. 

Below we show how to apply this technique to triangular curve sections. Unfortunately it 
does not work that well for rectangular curve sections. But there our new proof technique of 
Theorem [2] comes to rescue, leading to better bounds. 

Theorem 3 Any scanning order with a rule that contains a triangle has WL2 > 4. 

Proof: Consider a triangle that constitutes a contiguous section of the curve. Let ii, Z2 and is 
be its three vertices, in the order in which they appear on the curve. Let io be the point on the 
edge iiis that appears before ^2 and is closest to is among such points. Let i4 be a point on the 
segment igis, arbitrarily close to io. Let w be the length of the edge iiis, and let h be the height 
of the triangle relative to this edge, that is, the distance between i2 and the line containing iiis, 
see Figure |5j 

The curve may pass through the points just defined in four different orders, it must be 

ion^2i3U, iiioi2i3U, ioiiiiiiis, or iiioi2i4i3- 

We will now analyse the last possibility in detail (the others can be checked in a similar 
way). Consider the first leg of this path. When going from ii to io, the area C(ii,io) filled by 
the curve must satisfy (i2(ii, io)/|C(ii, io)| < WL2, that is: 



|C(ii,io)| > dl 



(ii,io)/WL2. 



Similarly we get: 



|C(io,i2)| >di(io,i2)/WL2 > 

|C(i2,i4)| >(ii(i2,i4)/WL2 > 
|C(i4,i3)| >di(u,i3)/WL2 = 



/IVWL2, 

/iVWL2, 

- 2wd2{ii,io) + (ii(ii, io)) /WL2. 
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Adding these up we get: 



\C{ii,h)\ > {2h^ + -2wd2{ii,io) + 2dl{h,io)) /WL2 > {2h^ + w^/2) /WL2. 

Note that C{ii,io) U C(ioi*2) U C(i2,u) U 0(14,13) U 0(11,13) is at most the complete triangle, 
with \C(ii,i3)\ < hw/2. Thus we get hw/2 > (2h'^ + w'^/2)/WL2, which solves to WL2 > 
Ah/w + w/h > 4. 

The other possible orders of io,ii,i2,h and 14 can be analysed in a similar way, leading to 
the same result. □ 

Note that in fact we get WL2 > 4a + 1/a, where a is the height/width ratio that minimises 
the right-hand side. Somewhat surprisingly, this implies that optimal locality cannot be achieved 
with equilateral triangles: with a = they are subject to a lower bound of WL2 > 8/\/3, 
while the triangles of the Sierpihski-Knopp order, with a = 1/2, give WL2 = 4 [24]. 

The proof technique of Niedermeier et al. could also be modified for scanning orders that 
contain rectangular (but not necessarily square) sections. However, in that case the lower bound 
will drop below 3 (consider rectangles of aspect ratio \/5)- Below we show how to use the proof 
technique of Theorem [2] to get a lower bound of 4, not only for WL2 but also for WLqo- 

Theorem 4 Any scanning order based on recursively subdividing an axis-aligned rectangle into 
a regular grid of rectangles has WL2 > WLqo > 4. 

Proof: We follow the same approach as with Theorem |2] but to get a good bound on WLqo 
(not only WL2), we need to be a bit more careful in the definition of the corners. 

Consider a subdivision of the unit rectangle into a regular grid of rectangles, following the 
rules of the scanning order recursively to the depth where a grid of ^/m x ^/m rectangles is 
obtained. Let si,...Sm be these rectangles in the order in which the space-filling curve visits 
them. Note that each rectangle touches the next one, at least in a vertex, otherwise WL2 and 
WLoo would be unbounded. Assume that the height/width ratio of each rectangle is a > 1 
(otherwise we swap the coordinate axes). Within this proof, we define the width of a single 
rectangle to be 1/ ^/a, and so its height is ^/a and its area is 1 . 

Among rectangles 82,83, Sm-i, we distinguish three types of rectangles. 

A rectangle is straight when Sj_i, and Sj+i lie either in the same row, or in three 
different rows. 

A rectangle Sj is a corner rectangle when exactly one rectangle out of and Sj+i lies in 
the same row as Sj. The outer corner of is the vertex that lies farthest from and Sj+i; 
more precisely, if is in the same row as Sj, the outer corner is the vertex of Sj that touches 
neither the column of nor the row of Si+i; otherwise, that is, if Sj+i is in the same row 
as Si, the outer corner is the vertex of Sj that touches neither the row of nor the column 

of Sj+l. 

A rectangle Sj is a double corner rectangle when Si-i and Sj+i lie in the same row, but not 
in the same row as Sj — implying that the curve makes something of a U-turn in Sj. Such a 
rectangle Si has two outer corners, namely the vertices of Sj that do not touch or Sj+i. We 
distinguish a first corner and a second corner, by the order in which they appear in the scanning 
order between Sj_i and Sj+i (Figure [6|. 

Now number the corner rectangles ti,t2, ■■■,tk in the order in which they appear on the 
curve, with each double corner rectangle listed twice. Let pi,P2, ■■■,Pk be the outer corners of 
these rectangles. Where ti and ij+i are the two copies of a double corner rectangle, pi is the 
first corner and pi+i is the second corner. The front part of tj is the part of tj that appears 
before pj in the order. As before, we have k > ^/rn. 
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Pi 

Figure 6: Comers of a scanning order that is not edge-connected on a rectangular grid. Note 
how horizontal and mainly vertical sections alternate between the corners. Although both the 
predecessor and the successor of rectangle A lie to the left, it is not a double corner rectangle, 
since the path to the predecessor is horizontal and a path to a rectangle in another row (such 
as the successor ^4) is always considered to be vertical. Rectangles B and C are double corner 
rectangles. 



As before, we will argue about curve sections between two corners pi to Pi+2, for even i. In 
addition we assume that for even i, the rectangles ti and tj+i lie in the same row (if this is not 
the case, we could simply do the calculations given below for odd i instead of even i). 

For the sake of contradiction, assume that WLqo < 4. 

Let w > 1 he the number of rectangles in the order from ti to tj+i inclusive {w could be 1 
if ti = is a double corner rectangle), and h >2 the number of rectangles in the order from 
tj+i to inclusive). 

The Loo-distance between pi and pi+i is at least wj^fa^ hence we get 
WLoo > ^nP' l(w — 1 -|- /i+i — /j), which we rewrite as: 

The Loo-distance between pj+i and Pi^i is at least ^/ah. Hence we get: 

WLoo > —— ^, (2) 

which we rewrite as: 

Adding up Equations [T] and [3] we get: 

/«-/.>^!^^^if^- + 2). (4) 
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Furthermore Equation [2] gives us 4 > WLqo > ah? /{h — 1 + /j+2 — fi+i) > ah > h. From this 
we get that h must be either 2 or 3. In the case of /i = 2 Equation |4] becomes: 

, „ ^ w'^/a + Aa f w/a + Aa/w A ^ 4 

f,_L2 — fi > W = \ 1 ] W > 1. 

In the case of /i = 3 and w = 1 Equation |4] becomes: 

l/a + 9a 10 / 4 \ 4 
fi+2 - fi> — — 2 > 2 > 2 1 > 1. 

H^2 H - ^Loo - WLoo VWLoo ) WLoo 

In the case of /i = 3 and w >2 Equation |4] becomes: 

^ Iv^^-C-^' ^ ^WL^-'-^' - - ^ wt->- 

Thus we get /j_|_2 — fi > 4/WLoo — 1 in all cases. The proof now concludes as for Theorem |2] 
leading to the conclusion limm^oo WLoo = 4, which also implies WL2 > 4. □ 

Niedermeier et al. |[24j also proved WL2 > WLoo > 4, but for another class of scanning 
orders, namely those that (i) contain a section whose perimeter is an axis-aligned square and 
(ii) are cyclic, that is, the end of that section touches its beginning. Our proof does not 
need those conditions, but needs others. To prove WL2 > 4 we require that the curve has a 
triangular section, or that it has a rectangular section subdivided recursively into a regular grid 
of rectangles. 

Regarding Li-locality, Niedermeier et al. proved WLi > 8 if both conditions (i) and (ii) 
hold, and WLi > 6 1/2 if only condition (i) holds. We have no results to complement this: it 
seems hard to use our technique to prove any lower bound on WLi which is significantly better 
than 4. 



5 Approximating the worst-case measures 

In this section we describe how we can obtain upper and lower bounds on the quality measures 
such as the worst-case locality and the worst-case bounding box quality measures defined in Sec- 
tion [3j For ease of description, we assume that the scanning order is defined by a single recursive 
rule without reversals. The techniques described below can easily be extended to multiple-rule 
scanning orders or scanning orders with reversals (in fact that is what we implemented). 

Let /X be a mapping from regions to real numbers in a way that is invariant under all 
transformations T(i) involved in the recursive definition of the scanning order. For example, 
fj.{R) could be |bbox(i?)|/|i?|, or the square of the diameter of R divided by \R\. Our goal is 
to approximate fi* = limjt^oo supj_.jgjVfc (We may also let depend on C{i) and/or 

C{j).) The mapping /i must be well-defined when C{i,j) is not empty; when \C{i,j)\ = we 
may assume iJ,{C{i,j)) = 00. 

5.1 Representing curve sections by probes 

We will compute the approximation of fi* by exploring probes. A probe P is specified by 
three consecutive subsections of the order: a front section, a midsection, and a tail section. 
The probe P thus describes a set of contiguous subsections of the scanning order, namely all 
those subsections S that start somewhere in the front section of P and end somewhere in the 
tail section of P. For any probe P, let a{P) be the transformation that transforms C into 
the front section of P; let M{P) be the midsection of P; and let oo{P) be the transformation 
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1/2 1 3/2 1/2 1 3/2 



Figure 7: (a) The Hilbert order, (b) Base probe Bq2 of the Hilbert order. The front transforma- 
tion a{BQ2) consists of scahng with factor 1 /2 and a reflection in the hne x = y. The midsection 
M(Bq2) is the square [0, ^] X (the dark area in the figure). The tail transformation uj{Bq2) 
consists of scaling with factor 1/2 and a translation over (j, j). (c) Canonical form V of Bq2. 
The canonical form is obtained from Bq2 by applying the transformation a(i?02)~^) that is, 
reflection in the line x = y and scaling with factor 2. (d) Refinement i^2{'P) of V. (e) The 
same refinement in canonical form: child 7^32 of V. 

that transforms C into the tail section of P. A section P(i,j) of a probe P is the region 
a{P)C{>: i)UM{P)L)Lo{P)C{^ j). Let ij-*{P) be the maximum value of over all subsections 
S covered by the probe, that is, n*{P) = limf^^^sup^j^]^^ n{P{i, j)). A probe P may be 
rotated, mirrored, scaled and/or reversed: this does not affect the value of fi*{P). 

All subsections of the scanning order can be captured by a set of probes as follows. For 
< i < /c < n, let base probe Bik be the probe with front transformation T(i), midsection 
Ui<j<fc ^(j)) ^^cl tail transformation T{k). For an example, see Figure j7|a,b). 

Lemma 2 p* = maxo<i<fc<n 

Proof: For any x < fx* , let a, b be any pair such that a ~< b and fi{C{a, b)) > x. Now let c be 
the longest common prefix of a and b. Note that a and b have the same number of digits, so c 
must be a proper prefix of both a and b. Let 6 be the digit of a following the prefix c, and let b 
be the digit of b following the prefix c. Thus C(a) lies in subregion a of C(c), and C(6) lies in 
subregion b of C(c). Since fi is invariant under r(c), it follows that > fi(C{a,b)) > x. 

Thus, for any x < fi* , there are < i < k < n such that iJ,*{Bik) > x. This implies there are 
< i < k < n such that fi* < fi*(Bii.), which proves the lemma. □ 

Let refinement Tij{P) of probe P, with i,j G {0, ...,n — 1}, be the probe with front trans- 
formation Oi{P) o r(z), tail transformation iv{P) o T{j), and midsection a{P){C{>- i)) U M{P) U 
cj(-P)(C(^ j)); see Figure [7]^c,d). Since P = U%(^)) '^^ have ^i*{P) = max;U*(rjj(P)). 

We say a probe P is in canonical form if a{P) is the identity transformation. We can 
construct a canonical form V of any probe P by setting a('P) to the identity transformation, 
M{V) := a{P)~^{M{P)), and a;(P) := a{P)~^ o oj{P)\ see Figure [7];c,e). Since /x is invariant 
under all transformations involved, we have fJ'*{P) = fJ,*{V). Therefore it suffices to work only 
with probes in canonical form, where the children of a canonical probe V are the canonical forms 
of its refinements. So child Vij is the canonical probe with midsection M{Vij) := T{i)^'^{C{>- 
i)[jM{V)[Juo{V){C{-< j))) and tail transformation oj{Vij) := T{i)~^ o uj{V) o T{j) (Figure [7];e)). 
Observe that t{i)~^ always includes scaling with a factor greater than 1, so for any canonical 
probe V and any canonical child Vij we have \M{'Pij)\ > \M(V)\, unless i = n — 1, M(V) = 
and j = 0. 
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ComputeCurveQuality 

1 Q ^ an empty first-in-first-out queue 

2 i? <— an empty dictionary 

3 Insert the canonical forms of all base probes Bi^ in Q and in R 

4 lowerBound ^ max-pgg H~(V) 

5 while we do not like the gap between lowerBound and maxpgg /i+('P) 

6 do Extract a probe V from the head of Q 

7 for all canonical children Vij of V 

8 do if n'^iVij) > lowerBound and -R does not contain Vij 

9 then Add Vij to Q and R 

10 lowerBound ^ max(lowerBound, (Vij)) 

11 Report that /x* lies in the interval [lowerBound , max-p,zQ /i^('P)]. 



Figure 8: Algorithm to compute an approximation of a curve quality measure. 

Note that while computing ^*{V) may be difficult, it may be easy to get a lower bound 
H~{V) and an upper bound fJ-'^iV) on n*{V). For example, if fJ-{A) is defined as |bbox(74)|/|j4|, 
then \hhox{M{V))\/\C U M{V) Uuj{V){C)\ would be a lower bound on fi*{V), and |bbox(CU 
M{V)Uuj{V){C))\/\M{V)\ would be an upper bound on iJ,*{V) (provided \M{V)\ > 0, otherwise 
we define fJ-*{V) := oo). 

5.2 Searching probes 

Our general algorithm to approximate //* is shown in Figure [Sj The main idea of the 
algorithm is that it keeps replacing probes by their refinements to get tighter lower and upper 
bounds on n* . It is easy to see that the algorithm produces a correct lower bound. We will 
prove that the algorithm also produces a correct upper bound by proving that the following 
invariant holds after every iteration of the while loop: for every probe V that was ever added 
to the queue and has fJ-*{V) = n*, there is a descendant V' ofV'mQ such that fJ.*{V') = fj.* . 

In fact the algorithm would be trivial to prove correct if we would omit the check if Vij is 
contained in R on line [8} However, the algorithm would be useless, because the queue would 
continue to contain degenerate probes, that is, probes V with M{V) = 0. This is because some 
degenerate probes are inserted on line [s] (namely all base probes Bn. with k = i -|- 1), and 
whenever a degenerate probe V is extracted from the queue, its degenerate child Vn~i,o would 
be added to the queue. Since n'^iV) = oo for degenerate probes, the algorithm would never 
find an actual upper bound on fi*. 

Fortunately, it is easy to see that for most space-filling curves the algorithm can only generate 
a small number of different degenerate probes: typically for any degenerate canonical probe the 
transformation lo has scale factor one, rotations and reflections form a small closed set, and the 
translation is fixed because the tail section has to connect to the front section. Moreover, non- 
degenerate probes have only non-degenerate children. Therefore, making sure that no probe is 
inserted in Q more than once, guarantees that Q soon becomes and remains free of degenerate 
probes, and the algorithm soon finds an upper bound on We have to prove, however, that 
this upper bound is indeed correct. 

Theorem 5 Algorithm ComputeCurveQuality returns correct lower and upper hounds on fi* . 

Proof: We need to prove that, despite the fact that the algorithm refuses to insert probes in Q 
that were inserted before, the invariant still holds after every iteration: for every probe V that 
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was ever added to the queue and has IJ,*{V) = fi*, there is a descendant V of 7^ in Q such that 

For the sake of contradiction, let V be the probe that is removed from Q in the first iteration 
that violates the invariant and does not restore it by inserting children of V. Since IJ.*{V) = /J*, 
the probe V must have a child Vij with ji*{Vij) = /i* > lowerBound. If this child is not inserted 
in Q (restoring the invariant), it must be because R contains Vij already, which implies that at 
the beginning of the iteration Q contained a descendant V" of Vij with iJ,*{V") = li* . Since V" 
is also a descendant of V, the invariant that Q holds a descendant V' of V with i^*{V') = /x* 
can only be violated by also removing (at least) V" . But wc remove only one probe from the 
queue, so this implies V" = V, and wc have that Vij is a descendant of V and vice versa. Now 
as observed above, from canonical parent to canonical child the size of the midsection is either 
zero or strictly increasing. So if 7^ is a descendant of Vij, we must have M{V) = M{Vij) = 0, 
i = n — 1, and j = 0. 

Consider a full line of ancestry of Vij down from itself, that is, a sequence of canonical 
probes V°,...,V'', where V^ = Vij,V^~^ = V,V^ = V^, and each probe is a child of 

V^, ioT < m < k. We have ;U*('P™) = //* for all < m < k. We prove by induction on m 
that at the end of the current iteration, we have for all < m < that V^ is in R but not in 
Q. For V^ = Vij this is true because the violation of the invariant is caused by not inserting 
Vij in Q, which must be because it is in R already. Now, given that V^,...,^^ are in R but 
not in Q, consider 7?"*+^. The probe 7^"*+^ is in R, because V^ is in R and not in Q; when 
V"^ was removed from Q, its child T^'^+i, with iJ.*{V"^~^^) = /x* > lowerBound, must have been 
added to R if it was not in R already. But again V"^~^^ is not in Q, otherwise Q would contain 
a descendant of V and the invariant would not be violated. 

Therefore V^,...,V^ are all in R but not in Q, that is, they were all once added to the queue 
and have been extracted since. It follows that any non-degenerate children V* of V^,...,V^ 
with ^*{V*) = fi* were once inserted in R and Q, so Q must still contain descendants of any 
such non-degenerate children V*. The removal of V, a degenerate probe, did not change that, 
so if there was ever such a non-degenerate child V* with ii*{V*) = /x*, then a descendant V' 
of 7^*, and thus, of V, with n*{V') = n*, would still exist in Q and the invariant would not be 
violated. So we must conclude that for < m < A;, the degenerate probe 7^™-+^ is the only child 
V* of with ji*{V*) = /tx*. It follows that V in particular does not have any non-degenerate 
descendant probes V* with ii*{V*) = /x*. 

Now, for any value x < jx* , let a, h be any pair such that n{V{a, b)) > x. Let o, be a followed 
by a zero, and let 6 be 6 followed by a zero. Now wc have ^{V{a,b)) = ^{V{a,b)) > x, and 
a) / 0. The probe V* with midsection r(a)-^(C(>- a) U M{V) U uj{V){C{~< h))), and tail 
transformation T{a)~^ o u!{V) o T{b) , is a non-degenerate descendant probe of V with ii{V*) > x. 
Since we can find such a non-degenerate descendant probe of V for every x < ji* , this contradicts 
the conclusion of the previous paragraph that V does not have any non-degenerate descendant 
probes V* with ii*{V*) = ji* . 

Hence there cannot be such a probe V whose removal from Q leads to the invariant being 
violated at the end of an iteration. Therefore the invariant is maintained, and the algorithm is 
correct. □ 

5.3 Implementation 

To implement algorithm ComputeCurveQuality for a particular measure, one needs to come 
up with a representation of midsections of canonical probes V that enables a quick, correct, 
and reasonably tight approximation of IJ-*{V) from above and below. If the approximations are 
tight, the algorithm may zoom in on the worst-case sections of the curve quickly, keeping a 
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Figure 9: Expanding the ordering rule of the Sierpihski-Knopp order by one level results in 
rotations by multiples of 7r/2. 



small probe queue Q and refining only where necessary. If the approximations are not tight 
enough, the algorithm may degenerate into a search of a complete grid, taking long to converge 
or not converging at all. For an efficient operation of the algorithm the representation of the 
midsections should also be easy to maintain under the transformations in the rules that describe 
the scanning order. 

For some curves and measures a good implementation is easier to accomplish than for others. 
To compute WLqo, WL2, and WLi we only need to know the size (area) of the midsection; for 
WBA and WBP we also need to know the minimum rectangular bounding box of the midsec- 
tion; for WOA and WOP we need to know the minimum octagonal bounding box. For curves 
that only use axis-parallel reflections and rotations with angles of ivr, vr and |7r, these mid- 



section properties can be maintained easily. Fortunately, most curves presented in Section 2.2 



fulfill these requirements. However, the Sierpihski-Knopp order and the Gosper flowsnake use 
rotations that are not multiples of 7r/2, and as a result the WLqo, WLi, WBA, WBP, WOA 
and WOP measures are not invariant under rotation as required by the algorithm. 

For the Sierpihski-Knopp order this is easily solved by expanding the definition by one level 
of recursion, see Figure [9j 

For the Gosper flowsnake we observe that WL2 is invariant under rotation and can still 
be computed by our algorithm. Because the rotations of the curve are not a rational fraction 
of vr, any pair of points that defines the worst-case L2-locality will appear scaled down and 
turned arbitrarily close to horizontal somewhere in the curve, so that we have WLqo > WL2. 
Since the L2-distance between any pair of points is at least their Loo-distance, it follows that 
WLoo = WL2. Likewise, any pair of points that defines the worst-case L2-locality will appear 
scaled down and turned arbitrarily close to making a 45 degree angle with the horizontal, so that 
we have WLi > 2WL2 and WBP > WBA > WL2/2. Since the squared Li-distance between 
any pair of points is at most twice their squared L2-distance, it follows that WLi = 2WL2. For 
the computation of WBA, WBP, WOA and WOP similar arguments could be used, provided 
the boundaries of unit regions and midsections are represented in such a way that one can 
determine good lower and upper bounds on the maximum, over all possible rotations, of the 
size of the minimum bounding box. This seems hard to accomplish due to the fractalic nature 
of the region boundaries. 



6 Computational results 

Computing pairwise worst-case measures We implemented the approximation algorithm 
described above, specifically to compute the worst-case locality measures WLoo, WL2, and WLi, 
the worst-case bounding box quality measures WBA and WBP, and the corresponding measures 
for bounding octagons WOA and WOP. We ran our algorithm on the curves mentioned in 



Section 2.2, except that we did not try to compute WBA, WBP, WOA and WOP for the 



Gosper flowsnake, because of the reasons explained in Section 5.3 

The running times of the computations varied with the curves. When the algorithm suc- 
ceeded in having a probe queue of constant size, we would have an approximation with precision 
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0.0001 within a fraction of a second (higher precision being prevented by the number of bits 
used to represent numbers); when the probe queue kept growing, the computation could take a 
few minutes and the precision would be limited by memory requirements. WBA, WBP, WOA 
and WOP were always computed fast and precise; WLi was fast and precise for all orders except 
Sierpiiiski-Knopp order; WL2 and WLqo were generally computed only to a precision of 0.0005. 

We tested the 278 different orders that fit Wunderlich's scheme and are defined on a 2 x 2- 
grid (that is, Hilbert order), a 3 x 3-grid (R-order and the 272 Serpentine orders), or a 4 x 4-grid 
(4 different orders, not counting Hilbert order). From these orders five curves C turned out to 
be "dominant" in the sense that there was no curve that was better than C on at least one 
measure and at least as good as C on the other measures. These five curves are the Hilbert 
order (best on WLi, WBP and WOP), Serpentine 000 000 000 (that is, GP-order, best on WBA 
and WOA), Serpentine 011010110 and 101010101 (with equal scores, not best on anything 
but not dominated either), and Serpentine 110110110 (Meurthe, best on WL2 and WLqo). 

Computing average total measures We examined the five dominant curves further, to- 
gether with coil order, balanced GP-order, /?0-order, AR^W^-oidei, Z-order and Sierpihski- 
Knopp order. For these curves we estimated the average total bounding box area and the 
square average relative total bounding box perimeter (also for bounding octagons) by random 
sampling as follows: we generated 100 sets of numbers chosen uniformly between and 1 that 
subdivide the curve, where the logarithm of the size of each set was chosen uniformly between 
log 500 and log 18 000. Thus we cover an integral number of periods of fluctuation for each 
scanning order. We estimated the average total area (or relative perimeter) by taking the av- 
erage over these 100 sample subdivisions. Because in some applications it is useful to keep the 
-Li-diameter of curve sections small and our software was easy to adapt to it, we used the same 
method to also estimate the square average relative total curve section diameter in the L^o 
metric (ADqo) and in the Li metric (ADi). 

Results The results of our computations are shown in Table [T] Note that for some scanning 
orders the exact worst-case locality measures were already known: tight lower and upper bounds 
for the Hilbert order were proven one by one in several papers: lower bounds on WLqo, WL2, 
and WLi by Alber and Niedermeier [1], Gotsman and Lindenbaum [10], and Niedermeier and 
Sanders [25], respectively; upper bounds by Bauman [5], Bauman [5], and Chochia et al. [6], 
respectively. Tight bounds for GP-order, coil order (Luxburg 1), and Luxburg 2 were proven 
by Luxburg [T7], and tight bounds for Sierpihski-Knopp order were proven by Niedermeier et 
al. [23^, confirming earlier observations on WL2 [IF (with respect to these measures, the H-order 
described by Niedermeier et al. is equivalent to Sierpihski-Knopp order and to Cesaro's variant 
of the Von Koch curve as mentioned by Mandelbrot [E]). Our computations confirmed all of 
these results. The other bounds have been computed by us. The bounds on the W-measures are 
the average of lower and upper bounds which have a gap of at most 0.0005; all printed numbers 
are less than 0.001 off from the real values. Only the bounds for the Gosper flowsnake are less 
precise (this order involves rotations by angles of arctan g\/3, which makes it more challenging 
to get bounds with high precision). 

The bounds on the A-measures result from our experiments with random subdivisions of 
curves. We omit the results for ADi, because for all curves we found that 2ADoo < ADi. 
This means that the best total Li-diameter is obtained by rotating the curve by 45 degrees, so 
that the square total Li-diameter becomes twice the original square total Loo-diameter. The 
Sierpihski-Knopp order was the only one not affected by the rotation. 

Regarding worst-case locality in the L^o- and L2-metrics, we see that the best order in Wun- 
derlich's scheme had not yet been found: Meurthe (Serpentine 110110110) turns out to have 
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Order 


WLcoWLa WLi 


WBA ABA 


WBP ABP 


WOA AOA 


WOP 


AD^ 


Sierpinski-Knopp order 
Balanced GP 


4 4 8 
4.619 4.619 8.619 


3.000 1.78 
2.000 1.44 


3.000 1.42 
2.155 1.19 


1.789 1.25 
1.769 1.31 


1.629 
1.807 


1.77' 
1.72' 


GP (Serp.OOOOOOOOO) 
Serpentine Oil 010 110 
Ltixburg 2 (101 010 101) 
Meurthe (110110110) 

Coil (Serp. 111111111) 


8 8 102/3 
5.625 6.250 10.000 
55/8 6 1/4 10 
5.333 5.667 10.667 
62/3 62/3 102/3 


2.000 1.44 
2.500 1.44" 
2.500 1.49' 
2.500 1.41" 
2.500 1.41' 


2.722 1.28 
2.500 1.20 
2.500 1.24 
2.667 1.17 
2.667 1.17 


1.835 1.32 
2.222 1.32' 
2.222 1.35' 

2.000 1.30' 
2.222 1.29' 


2.395 
2.036 
2.036 
2.018 
2.424 


2.13' 
1.71' 
1.81' 
1.64' 
1.63' 


Hilbert 

f3n 

Z-order 


6 6 9 
5.000 5.000 9.000 
5.400 6.046 12.000 

00 00 00 


2.400 1.44 
2.222 1.42 
3.055 1.49' 
00 2.92 


2.400 1.19 
2.250 1.17 
3.125 1.22 
00 2.40' 


1.929 1.30 

1.800 1.29 
2.344 1.33 
cxD 2.46 


1.955 
1.933 
2.255 

00 


1.67' 
1.64' 
1.70' 
3.80" 


Gosper flowsnake 


6.35 6.35 12.70 


>3.18 


>3.18 









Table 1: Bounds for different measures and curves. New curves printed in bold. For the A- 
measures the standard deviation is indicated behind the number: no symbol when less than 
0.5%; one mark when between 0.5% and 1.0%, two marks when between 1.0% and 2.0%. 



even better locality in these measures than Luxburg's second variant (Serpentine 101 010 101). 
Even better locality is achieved by Wierum's /3f2-curve (matching or improving on Hilbert 's 
curve in all measures) and still better by our new Pcano variant: balanced GP. The latter 
approaches the locality of the Sicrpiiiski-Knopp order, which is still conjectured to be optimal. 

However, it appears that the optimal locality of the Sierpihski-Knopp order comes at a 
price: it results in high worst-case bounding box measures, and in our experiments on random 
subdivisions the resulting bounding boxes are about 25% worse than with most other orders. 
Only Z-order, which tends to result in bounding boxes twice as big as with the other orders, 
performs worse. The best performer on the worst-case bounding box measures is our balanced 
GP-order, which also performs well in the experiments on random subdivisions (similar to 
Hilbert), but coil order, Meurthe order and /3f2-order perform even better in the experiments. 

We also computed worst-case and average total bounding box measures for rectangular 
bounding boxes with edges at 45 degrees' angles with the coordinate axes. This was (very) 
harmful with all scanning orders except Sierpihski-Knopp order, where it had no effect (this 
was to be expected, because the definition of Sierpihski-Knopp order involves rotations by all 
multiples of 45 degrees). The Sierpihski-Knopp order benefits more than any other from using 
octagonal bounding boxes instead of rectangular bounding boxes, as we can see in the right 
columns: here Sierpihski-Knopp order is the best performer. However, all things considered 
the advantages of bounding octagons may not be worth the effort. Such octagonal bounding 
boxes need twice the description size of rectangular bounding boxes, but the savings are small: 
in total bounding octagons constructed by Sierpihski-Knopp order are only 13% smaller than 
bounding rectangles constructed by Hilbert or GP-order. 

7 Conclusions 

Pairwise worst-case measures Known locality measures of space-filling curves do not pre- 
dict well how effective they are when used to group points into bounding boxes. Therefore we 
proposed new measures of bounding-box quality of space-filling curves. We presented new scan- 
ning orders that perform well on these measures, most notably the balanced GP-order, which 
has worst-case bounding box area ratio (WBA) 2.000, and worst-case bounding box square 
perimeter ratio (WBP) 2.155. On worst-case locality measures this curve also scores very well, 
much better than Peano's original curve, and beaten only by Sierpihski-Knopp order. 
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R: 




Figure 10: Example of a scanning order based on rectangles, but not in a standard grid pattern. 
The locality and bounding-box quality of this particular scanning order are not very interesting: 
WLoo = 6.000, WL2 = 6.667, WLi = 12.656, WBA = 3.125, WBP = 3.164. 



We conjecture that a worst-case bounding box area ratio (WBA) of 2 is in fact optimal 
and cannot be improved by any (recursively defined) space-filling curve. More provocatively we 
conjecture that the optimal worst-case bounding box square perimeter ratio (WBP) is also 2 
(note that we have not actually found a curve with WBP less than 2.155). We add these 
conjectures to those by Niedermeier et al., who conjectured that the optimal WLoo, WL2, and 
WLi locality values are 4, 4, and 8, respectively (Niedermeier et al. posed this conjecture for 
curves filling a square, but we would like to drop this restriction). 

For proving the lower bounds in these conjectures we have come a long way. Niedermeier et 
al. proved tight lower bounds on the worst-case locality for a certain class of space-filling curves. 
Unfortunately, strictly speaking almost none of the space-filling curves in our study belongs to 
that class. For L2-locality and for worst-case bounding-box area and squared perimeter, we 
managed to prove the conjectured lower bounds for another class of space-filling curves, partly 
overlapping with the class covered by Niedermeier et al., and now including almost all space- 
filling curves mentioned in this paper. 

Still we have not been able to prove these lower bounds for all space-filling curves. Our 
proof is restricted to scanning orders based on axis-aligned rectangles in a regular grid pattern 
and to scanning orders based on triangles. In this paper we mentioned the Gosper flowsnake 
curve, which is based on a subdivision of a unit region into subregions that all have fractalic 
boundaries, and is therefore not included in the class of scanning orders to which our lower 
bounds apply. Of course one may argue that from a practical point of view it is questionable 
if one would like to use such a curve anyway, but we can also come up with scanning orders 
based on tiling the plane with rectangles — but not in a standard grid pattern — or L-shapes, 
for example (see Figure 10). Our bounds do not apply to such curves, and the bounds by 
Niedermeier et al. do not necessarily apply either. 

Regarding lower bounds on Li-locality we did not make any progress on Niedermeier et al. 



Performance in practice(?) Our experiments on random points may give an impression 
of how effective the different curves would be in the application considered in this paper: a 
data structure for points in the plane, based on sorting the points into blocks of points that are 
consecutive along the curve. We see that it would be clearly suboptimal to use the order with 
the best WLoo,WL2 and WLi locality (Sierpihski-Knopp order) for this application. It seems 
to be better indeed to choose a curve based on WBA and WBP (balanced GP-order). 

Still the WBA and WBP measures do not predict performance on random points perfectly 
either: there are several curves with only moderate WBA and WBP values that seem to be 
as effective as the balanced GP-order (for example Hilbert order) or even slightly better (for 
example coil order or pQ-oidei) on random point data. It should also be noted that total 
bounding-box area and perimeter may not be the only factors that determine the performance 
of a curve in a data structure setting. Asano et al. 0] argued that in certain settings it would be 
good if any axis-parallel square query window can be covered by a small number of contiguous 
sections of the curve that together cover an area of at most a constant times that of the 
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query window. It would be interesting to have an algorithm that can analyse a given curve 
automatically to compute the constants involved, in the worst case and on average. 

Higher dimensions For what the WBA and WBP measures are worth, the conjectured 
near-optimality of the balanced GP-order suggests that there is little room for hope to find 
significantly more effective scanning orders in two dimensions. A first topic for further research 
is to determine the gap between our lower bound constructions and the performance of known 
space- filling curves when we consider generalisations to three dimensions. Chochia and Cole [61 
and Niedermeier et al. [23] have some results on locality, but the gap is large and the field is 
still wide open, especially with respect to bounding box quality. 

Still higher dimensions can be interesting; four- dimensional space-filling curves can be par- 
ticularly interesting to order rectangles in the plane (which are specified by four coordinates 
each) [21 [TTl [15] . A first challenge in that context is to define appropriate quality measures that 
say something sensible about the quality of bounding boxes in two dimensions that are formed 
by grouping points in four dimensions. 
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